Abstract 

Here we discuss the evolution of the northern Australian Staphylococcus aureus isolate MSHR1132 genome. MSHR1132 
belongs to the divergent clonal complex 75 lineage. The average nucleotide divergence between orthologous genes in 
MSHR1132 and typical S. aureus is approximately sevenfold greater than the maximum divergence observed in this species 
to date. MSHR1132 has a small accessory genome, which includes the well-characterized genomic islands νSaα and νSaβ, 
suggesting that these elements were acquired well before the expansion of the typical S. aureus population. Other mobile 
elements show mosaic structure (the prophage φSa3) or evidence of recent acquisition from a typical S. aureus lineage 
(SCCmec, ICE6013, and plasmid pMSHR1132). There are two differences in gene repertoire compared with typical S. aureus 
that may be significant clues as to the genetic basis underlying the successful emergence of S. aureus as a pathogen. First, 
MSHR1132 lacks the genes for production of staphyloxanthin, the carotenoid pigment that confers upon S. aureus its 
characteristic golden color and protects against oxidative stress. The lack of pigment was demonstrated in 126 of 126 CC75 
isolates. Second, a mobile clustered regularly interspaced short palindromic repeat (CRISPR) element is inserted into orfX of 
MSHR1132. Although common in other staphylococcal species, these elements are very rare within S. aureus and may 
impact accessory genome acquisition. The CRISPR spacer sequences reveal a history of attempted invasion by known 
S. aureus mobile elements. There is a case for the creation of a new taxon to accommodate this and related isolates. 
Introduction 

The bacterial species Staphylococcus aureus is a human commensal that commonly colonizes the skin and
mucosal surfaces. It is also a major human pathogen that can cause a variety of disease states, including
minor skin and soft tissue infections and life-threatening systemic and pulmonary infections.
Staphylococcus aureus is a phylogenetically well-defined species. Orthologous pairs of housekeeping coding
sequences (i.e., core genes) generally exhibit <2% nucleotide diversity within S. aureus, which is at least
10-fold lower than the diversity between S. aureus and the most closely related Staphylococcus species
(Enright et al. 2000; Poyart et al. 2001; Drancourt and Raoult 2002; Ghebremedhin et al. 2008). Multilocus
sequence typing (MLST)-based studies have revealed that intraspecific homologous recombination occurs
occasionally within the core genome but that this is less frequent than in other bacterial species
(Feil et al. 2003; Pearson et al. 2009). Furthermore, a high level of synteny (gene order) is retained
between different S. aureus strains. Superimposed upon the stable S. aureus core genome is a more variable
accessory genome composed of phage, genomic islands, transposons, and other mobile genetic elements. These
accessory genes are rapidly lost or acquired by horizontal gene transfer and play a key role in adaptation
and pathogenicity (Lindsay and Holden 2006). 

A community-associated lineage of S. aureus, termed "clonal complex 75" (CC75), was recently reported as
the dominant community-associated methicillin-resistant S. aureus (MRSA) genotype among indigenous
communities in the Northern Territory of Australia (McDonald et al. 2006). The presence of this lineage has
subsequently been documented in Southeast Asia (Ruimy et al. 2009) and South America (Ruimy et al. 2010),
and data housed on saureus.mlst.net also point to its presence in Europe. This suggests that CC75 is
widespread but has until recently escaped notice, possibly because of difficulties in typing these strains
using the standard MLST primers. MLST data revealed an average nucleotide divergence between CC75 and
typical S. aureus of 9.6% over all seven loci (ranging from 5.8% at gmk to 15.8% at aroE) (Ng et al. 2009;
Ruimy et al. 2009). Despite this very high level of divergence, CC75 remains more closely related to
typical S. aureus than the next most closely related species (Staphylococcus simiae), and it is identical
to S. aureus by 16S ribosomal RNA (rRNA; Ng et al. 2009; Ruimy et al. 2009). Array-based experiments are
also suggestive of high divergence (Monecke et al. 2010). 

Here we report the complete genome sequence of strain MSHR1132, a CC75 clinical isolate from the tropical
north of the Northern Territory of Australia. These data confirm the high level of core nucleotide
divergence between CC75 and typical S. aureus. The genome is toward the low end of the size range of other
sequenced S. aureus strains, and this is due to a small accessory genome that contains no pathogenicity
islands (SaPIs) or novel elements. The presence of the genomic islands νSaα and νSaβ suggests that these
elements were acquired very early in the evolution of S. aureus. Two other notable attributes of the
MSHR1132 genome are the absence of the operon encoding staphyloxanthin and the presence of a CRISPR region.
We discuss the possible adaptive significance of these findings. 

Methods 

Bacterial Isolates 

MSHR1132 was isolated from the blood culture of an indigenous woman with necrotizing fasciitis at the Royal
Darwin Hospital (RDH), Darwin, Northern Territory, Australia, in September 2006. The isolate was resistant
to oxacillin but susceptible to all tested non-β-lactam antibiotics. Additional CC75 isolates (M34, M70,
M180, HS2, HS22, HS42, HS158, SCC1098, SCC1119, SCC1165, SCC1229, SCC1302) from impetigo lesions
(McDonald et al. 2006) and hospital clinical specimens (Tong et al. 2009) were identified using a real-time
polymerase chain reaction (PCR) single nucleotide polymorphism (SNP) typing approach (McDonald et al. 2006). 

Whole Genome Sequencing 

The whole genome of the S. aureus isolate MSHR1132 was sequenced using both capillary sequencing (on ABI
3730xl analysers) and pyrosequencing (on 454 instruments; subsidiary of Roche Diagnostics Corporation,
Branford, CT). A total of 29,300 high-quality capillary reads were produced, mostly from two subclone
libraries (libraries with inserts in the range 2-6 kb using the vector pOTW12 and libraries with inserts in
the range 5-12 kb using the vector pMAQ1Sac_BstXI). The average read length was 650 bp, and these reads
represented 6.8× coverage. The 454 sequencing produced 73.9 Mb of data in reads with an average length of
245 bp. The assembly of these reads using Newbler 1.1.03.24 gave 70 contigs >500 bp with a combined length
of 2,753,553 bp in nine scaffolds. 
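
The coverage figures above follow from simple arithmetic: total sequenced bases divided by genome length. The sketch below is illustrative only; it uses the combined 454 contig length as the denominator, which is an assumption (the text does not state which genome length was used), so the capillary value lands near, rather than exactly on, the reported 6.8×.

```python
def fold_coverage(total_bases: float, genome_length: int) -> float:
    """Average sequencing depth: total sequenced bases / genome length."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return total_bases / genome_length


# Figures quoted in the text.
CAPILLARY_READS = 29_300
CAPILLARY_MEAN_READ_LENGTH = 650   # bp
PYROSEQUENCING_BASES = 73.9e6      # 73.9 Mb of 454 data
# Assumption: use the combined 454 contig length as the genome length.
ASSEMBLY_LENGTH = 2_753_553        # bp

capillary_depth = fold_coverage(
    CAPILLARY_READS * CAPILLARY_MEAN_READ_LENGTH, ASSEMBLY_LENGTH
)
pyro_depth = fold_coverage(PYROSEQUENCING_BASES, ASSEMBLY_LENGTH)
print(f"capillary: {capillary_depth:.1f}x, 454: {pyro_depth:.1f}x")
```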

A combined assembly of the capillary reads, using phrap, and the consensus sequences from the 454 assembly
(which were converted into overlapping 500-bp sequences) produced 26 contigs >2 kb with an N50 of 532 kb.
A further 2,310 high-quality reads were produced to close gaps and to improve the quality of the sequence
to finished standard. 
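
For readers unfamiliar with the N50 statistic quoted above, the following minimal sketch shows one common definition: the length of the contig at which the running total of contigs, sorted longest first, reaches half of the total assembly length. The function name and the toy contig lengths are hypothetical; the actual contig lengths of the MSHR1132 assembly are not given in the text.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (in bp)."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    half_total = sum(lengths) / 2
    running_total = 0
    for length in lengths:
        running_total += length
        # The first contig that takes the running total to half the
        # assembly length defines the N50.
        if running_total >= half_total:
            return length


# Toy example with made-up contig lengths.
toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy_contigs))
```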

The sequences and annotations of the S. aureus strain MSHR1132 chromosome and plasmid have been deposited
in the EMBL database under accession numbers FR821777 and FR821778, respectively. The sequence was
annotated using Artemis software (Rutherford et al. 2000). Initial coding sequence (CDS) predictions were
performed using Orpheus (Frishman et al. 1998), Glimmer2 (Delcher et al. 1999), and EasyGene (Larsen and
Krogh 2003) software. These predictions were amalgamated, and codon usage, positional base preference
methods, and comparisons with the nonredundant protein databases using BLAST (Altschul et al. 1990) and
FASTA (Pearson and Lipman 1988) software were used to refine the predictions. The entire DNA sequence was
also compared in all six potential reading frames against UniProt, using BLASTx (Altschul et al. 1990), to
identify any possible coding sequences previously missed. Protein motifs were identified using Pfam
(Bateman et al. 2002) and Prosite (Falquet et al. 2002), transmembrane domains were identified with TMHMM
(Krogh et al. 2001), and signal sequences were identified with SignalP version 2.0 (Nielsen et al. 1997).
rRNAs were identified using BLASTn (Altschul et al. 1990) alignment to defined rRNAs from the EMBL
nucleotide database; transfer RNAs (tRNAs) were identified using tRNAscan (Lowe and Eddy 1997); stable RNAs
were identified using Rfam (Griffiths-Jones et al. 2003). 

Comparative Genomics 

Comparison of the genome sequences was facilitated by using the Artemis Comparison Tool (Carver et al.
2005), which enabled the visualization of BLASTn and tBLASTx comparisons (Altschul et al. 1990) between the
genomes. Orthologous proteins were identified as reciprocal best matches using FASTA (Pearson and Lipman
1988) with subsequent manual curation. Pseudogenes had one or more mutations that would prevent correct
translation; each of the inactivating mutations was subsequently checked against the original sequencing
data. 
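
The reciprocal-best-match criterion described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the best FASTA hit for each protein has already been extracted into two dictionaries (one per search direction), and the function name and locus identifiers in the example are hypothetical. Manual curation, as used in the study, is not modeled.

```python
def reciprocal_best_hits(best_a_to_b, best_b_to_a):
    """Return sorted (a, b) pairs that are each other's best match.

    best_a_to_b maps each protein in genome A to its best hit in genome B;
    best_b_to_a maps each protein in genome B to its best hit in genome A.
    """
    return sorted(
        (a, b)
        for a, b in best_a_to_b.items()
        if best_b_to_a.get(b) == a
    )


# Hypothetical identifiers, for illustration only.
a_to_b = {"geneA1": "geneB1", "geneA2": "geneB2", "geneA3": "geneB2"}
b_to_a = {"geneB1": "geneA1", "geneB2": "geneA2"}
print(reciprocal_best_hits(a_to_b, b_to_a))
```

Here geneA3 is excluded because its best hit, geneB2, points back to geneA2 rather than to geneA3.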
Staphylococcus aureus sequences used for comparative genomic analysis were MRSA252 (accession number
BX571856) (Holden et al. 2004), MSSA476 (BX571857) (Holden et al. 2004), MW2 (BA000033) (Baba et al. 2002),
N315 (BA000018) (Kuroda et al. 2001), Mu50 (BA000017) (Kuroda et al. 2001), Mu3 (AP009324) (Neoh et al.
2008), COL (CP000046) (Gill et al. 2005), NCTC8325 (CP000253) (Gillaspy et al. 2006), USA300_FPR3757
(CP000255) (Diep et al. 2006), JH9 (CP000703) (Mwangi et al. 2007), Newman (AP009351) (Baba et al. 2008),
and RF122 (AJ938182) (Herron-Olson et al. 2007). The sequences were also compared with Staphylococcus
epidermidis RP62a (CP000029) (Gill et al. 2005), S. epidermidis ATCC 12228 (AE015929) (Zhang et al. 2003),
Staphylococcus haemolyticus JCSC1435 (AP006716) (Takeuchi et al. 2005), Staphylococcus saprophyticus
ATCC 15305 (AP008934) (Kuroda et al. 2005), Staphylococcus carnosus TM300 (AM295250) (Rosenstein et al.
2009), and Staphylococcus lugdunensis HKU09-01 (CP001837) (Tse et al. 2010). 

Phylogeny Based on Genetic Content Distance Matrix 

As part of the genome alignment process, progressiveMauve (Darling et al. 2010) generates a pairwise genome
content distance matrix that reflects the proportions of the genomes that are included in the initial set
of local alignments. Because the progressiveMauve algorithm only requires that a sequence block be present
in at least two of the genomes for it to be included in the multiple alignment, this analysis yields the
alignable blocks of sequence between each pair of genomes. The extent of the alignable blocks is a direct
measure of shared content. The pairwise distances may be calculated by dividing the total extent of the
aligned blocks by the average of the sizes of the two genomes. In this way, progressiveMauve was used to
generate a distance matrix, which was converted into a tree using the Neighbor-Joining algorithm
implemented in PHYLIP (Felsenstein 1989). 
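
The calculation described above can be sketched in a few lines. The ratio of aligned extent to mean genome size is a shared-content fraction; turning it into a distance by subtracting it from 1 is an assumption made here for illustration, as is the function name. The numbers in the example are invented and are not taken from the progressiveMauve output for these genomes.

```python
def content_distance(aligned_extent: float, size_a: int, size_b: int) -> float:
    """Genome content distance between two genomes.

    aligned_extent: total length (bp) of blocks alignable between A and B.
    size_a, size_b: genome sizes in bp.
    Assumption: distance = 1 - (aligned extent / mean genome size).
    """
    mean_size = (size_a + size_b) / 2
    if mean_size <= 0:
        raise ValueError("genome sizes must be positive")
    shared_fraction = min(aligned_extent / mean_size, 1.0)
    return 1.0 - shared_fraction


# Invented example: 2.5 Mb alignable between genomes of 2.8 Mb and 2.9 Mb.
example_distance = content_distance(2_500_000, 2_800_000, 2_900_000)
print(round(example_distance, 3))
```

A matrix of such pairwise values, written in PHYLIP distance format, is the kind of input a neighbor-joining program expects.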

Phylogeny Based on SNPs 

ProgressiveMauve produces several output files containing data related to the genome alignment. One of
these lists every SNP in the alignment. Using an ad hoc Perl script, we parsed this file to retain only
those SNPs that were located in regions homologous across the genomes of MSHR1132, MRSA252,
USA300_FPR3757, ED98, and S. epidermidis RP62a. With those SNPs, the percentage of identity between any two
genomes was established, and an identity matrix was assembled and converted to a tree using the
Neighbor-Joining algorithm implemented in PHYLIP (Felsenstein 1989). 
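
The identity matrix described above can be illustrated with a short sketch. The authors' Perl script and the layout of the progressiveMauve SNP file are not reproduced here; instead, this example assumes the homologous columns have already been extracted as equal-length aligned strings, one per genome, and skips any column containing a gap. All names and the toy sequences are hypothetical.

```python
from itertools import combinations


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of identical bases over ungapped aligned columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for base_a, base_b in zip(seq_a.upper(), seq_b.upper()):
        if base_a == "-" or base_b == "-":
            continue  # ignore columns with a gap in either genome
        compared += 1
        if base_a == base_b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def identity_matrix(aligned: dict) -> dict:
    """Symmetric matrix of pairwise percent identities."""
    names = list(aligned)
    matrix = {name: {name: 100.0} for name in names}
    for a, b in combinations(names, 2):
        value = percent_identity(aligned[a], aligned[b])
        matrix[a][b] = value
        matrix[b][a] = value
    return matrix


# Toy alignment, not real data.
toy = {"genome1": "ACGTACGT", "genome2": "ACGTACGA", "genome3": "AC-TTCGA"}
toy_matrix = identity_matrix(toy)
print(toy_matrix["genome1"]["genome2"])
```

Subtracting each percentage from 100 gives a divergence value that can serve as a distance for tree building.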

Identification of Genes Under Positive Selection 

We utilized the branch-site models implemented in PAML (Yang 2007), which aim to detect positive selection
that has affected only a few sites on some lineages. The lineages under positive selection are known as
foreground branches, whereas all other branches are the background branches; this distinction between
foreground and background branches should be made a priori (in this case, the branch leading to MSHR1132
was the foreground branch, whereas the rest of the branches were background branches). According to this
model, only foreground branches may have experienced positive selection. The model assumes four classes of
sites (here sites mean codons rather than single nucleotides, and ω equals dN/dS): Class 0 includes codons
that are conserved throughout the tree (with 0 < ω0 < 1, ω0 estimated from the data); Class 1 has codons
that are evolving neutrally throughout the tree (ω1 = 1); Class 2a has codons that are conserved on the
background branches (0 < ω0 < 1) but are under positive selection on the foreground branches (ω2 > 1); and
Class 2b includes codons that are evolving neutrally on the background branches (ω1 = 1) but are under
positive selection on the foreground branches (ω2 > 1). Whereas the alternative hypothesis uses the four
classes of sites as described above, thus allowing positive selection on the foreground branch, in the null
hypothesis ω2 = 1 is fixed for Classes 2a and 2b. A simple likelihood ratio test (LRT) is constructed by
comparing the null and alternative hypotheses (Zhang et al. 2005). The LRT was applied to a set of 1,776
MSHR1132 genes for which orthologues were identified in each of the MRSA252, USA300_FPR3757, and
S. epidermidis RP62a genome sequences in the comparative genomic analyses. 
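
The likelihood ratio test described above compares the maximized log-likelihoods of the null (ω2 = 1) and alternative (ω2 > 1) models for each gene. The sketch below is a generic illustration, not the authors' analysis: it assumes the two log-likelihoods have already been obtained (for example from two codeml runs), and it assumes a chi-square reference distribution with one degree of freedom, a common and conservative choice for this test. The significance threshold and any multiple-testing correction used in the study are not stated in the text, so none is applied here.

```python
import math


def branch_site_lrt(lnl_null: float, lnl_alt: float):
    """Return (test statistic, p-value) for a nested-model LRT.

    statistic = 2 * (lnL_alternative - lnL_null), floored at zero.
    The p-value uses the chi-square survival function with 1 degree of
    freedom, which equals erfc(sqrt(statistic / 2)).
    """
    statistic = max(0.0, 2.0 * (lnl_alt - lnl_null))
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Invented log-likelihoods for a single gene.
stat, p = branch_site_lrt(lnl_null=-1000.0, lnl_alt=-998.08)
print(f"2*delta lnL = {stat:.2f}, p = {p:.3f}")
```

When the same test is run on many genes, as with the 1,776 orthologs here, per-gene p-values are usually interpreted with some allowance for multiple testing.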

Phenotypic and Genotypic Analysis of Isolates 

Isolates were tested for the production of staphyloxanthin by incubation on chocolate agar plates (Oxoid)
for 48 h at 37 °C. Screening for lukPV was conducted by real-time PCR as previously described (McDonald
et al. 2006). Multilocus sequence typing of CC75 isolates was conducted using primers for aroE (Ruimy
et al. 2009) and for glpF, gmk, tpi, and yqiL (Ng et al. 2009) that were modified with respect to the
standard S. aureus MLST scheme (Enright et al. 2000). 

Biochemical tests were carried out using a VITEK 2 device (bioMérieux Australia Pty. Ltd., Baulkham Hills
NSW, Australia) according to the manufacturer's instructions. 

Results 

General Characteristics of the Genome 

The genome of CC75 isolate MSHR1132 consists of a single circular chromosome of 2,762,762 bp containing
2,578 coding sequences (CDSs) and a single circular plasmid (pMSHR1132) of 24,853 bp, containing 23 CDSs.
The chromosome is colinear in comparison with other sequenced S. aureus chromosomes and has a very similar
gene content (fig. 1). The 16S rRNA sequence is confirmed as identical to that of other S. aureus. The
MSHR1132 genome 
Fig. 1. Circular diagram of the S. aureus MSHR1132 chromosome. Key for the circular diagram (outer to inner): Outer colored segments on the 
gray outer ring represent genomic islands and horizontally acquired DNA (see figure for key); scale (in Mb); annotated CDSs colored according to 
predicted function are shown on a pair of concentric circles, representing both coding strands; S. aureus reciprocal FASTA matches shared with the 
S. aureus strains (blue): MRSA252, MSSA476, MW2, N315, Mu50, Mu3, COL, NCTC8325, USA300_FPR3757, JH9, Newman, RF122, LGA251, TW20; 
Staphylococcus reciprocal FASTA matches shared with the staphylococcal species (purple): S. epidermidis, S. saprophyticus, S. haemolyticus, S. carnosus; 
reciprocal FASTA matches shared with Macrococcus caseolyticus (green); CDS functions: dark blue, pathogenicity/adaptation; black, energy metabolism; 
red, information transfer; dark green, surface associated; cyan, degradation of large molecules; magenta, degradation of small molecules; yellow, 
central/intermediary metabolism; pale green, unknown; pale blue, regulators; orange, conserved hypothetical; brown, pseudogenes; pink, phage and IS 
elements; and gray, miscellaneous. 



possesses a novel spa type (repeat numbers 259, 31, 17, 17, 17, 22, 17, 17, 23, 17, and 22), a distinct coa
type (type XII), agr type I, and MLST sequence type (ST) 1850. The capsule gene cluster of 15 genes
(SAMSHR1132_01230 to SAMSHR1132_01380) closely resembles the capsule gene cluster that encodes serotype 8
in other S. aureus strains. 

The accessory genome is small. Readily identified elements consist of the genomic islands νSaα and νSaβ,
a type IVa Staphylococcal cassette chromosome mec (SCCmec) inserted into orfX, a single integrative
conjugative element (ICE), a single putative transposon, a single prophage, and a plasmid, pMSHR1132.
Staphylococcus aureus pathogenicity islands (SaPIs) were not detected. 

MSHR1132 is Divergent from Other S. aureus Throughout the Core Genome 

Sequencing of the MLST loci and a small sample of other 
housekeeping genes had previously demonstrated that 
Fig. 2. Nucleotide divergence plot of MRSA252 compared with USA300_FPR3757, MSHR1132, and S. epidermidis RP62a. (A) Plot of the 
nucleotide divergence against gene position between the genomes of MRSA252 and USA300_FPR3757 (purple), MSHR1132 (red), and S. epidermidis 
RP62a (green). (B) The divergence between MRSA252 and MSHR1132 is approximately seven times greater than that between MRSA252 and 
USA300_FPR3757 and slightly less than half the divergence between S. aureus and S. epidermidis. 



these genes were significantly diverged in CC75 isolates compared with other S. aureus (Ruimy et al. 2009;
Ng et al. 2009). To refine the phylogenetic position of MSHR1132, the identities between 1,498 orthologous
CDSs in the core genomes of MRSA252 and MSHR1132, USA300_FPR3757, and S. epidermidis RP62a were determined.
MRSA252 and USA300_FPR3757 were chosen because they are from different phylogenetic groups within S. aureus
(CC30/group 1a and CC8/group 2, respectively), so their divergence represents the upper range of the
divergence between non-CC75 S. aureus strains (Cooper and Feil 2006). The divergence of MSHR1132 from
MRSA252 is approximately seven times greater than that between MRSA252 and USA300_FPR3757 and slightly less
than half the divergence between S. aureus and S. epidermidis (fig. 2). There is no clear evidence for
recent core genome horizontal gene transfer between MSHR1132 and any of the other strains. 

In order to confirm the phylogenetic position of CC75 relative to typical S. aureus and S. epidermidis, we
constructed a tree based on nucleotide divergence of the core genes and a second tree based on differences
in genome content. Although the topologies of the two trees are identical and each places CC75 as
intermediate between typical S. aureus and S. epidermidis with respect to time of divergence, there are
large differences in the relative branch lengths (fig. 3; note that the two trees are drawn to different
scales, and branch lengths are given on the tree). In the nucleotide divergence tree, the branch lengths
within the typical S. aureus clade are much shorter than the branch leading to CC75. This reflects the fact
that the divergence between unrelated lineages of typical S. aureus genomes is ~2%, whereas CC75 is ~10%
diverged from typical S. aureus. However, in the genome content tree, the branch lengths leading to the
typical S. aureus and CC75 are much more similar. For example, the distance between MRSA252 and ED98 is 85%
of the distance between MSHR1132 and ED98. The simplest explanation of these data is rapid evolution of the
accessory genome, with constraints both on total genome size and on core genome size that result in
accessory genome differences between strains rapidly becoming saturated. 

Variation within the CC75 Lineage 

In order to gauge the diversity within the CC75 lineage, we characterized 12 further CC75 isolates
recovered from the northern part of the Northern Territory of Australia using a modified MLST scheme. The
Australian isolates were resolved by MLST into five STs. A neighbor-joining tree that included the CC75
sequence type ST1223, present on the S. aureus MLST database, resolved the STs into three distinct lineages
represented by STs 1824, 1848, and 1849, STs 1223 and 1823, and ST 1850 (that includes MSHR1132) 
Fig. 3. Nucleotide divergence and genome content neighbor-joining trees. The neighbor-joining trees were generated from distance matrices 
based on (A) genome content and (B) nucleotide divergence between MRSA252, USA300_FPR3757, ED98, MSHR1132, and S. epidermidis RP62a. 



(fig. 4). The average divergence between these STs was approximately 0.5%, and the maximum divergence,
between STs 1850 and 1223, was 0.94%. This is an approximately 10-fold higher level of divergence than is
seen within typical S. aureus clonal complexes by MLST (e.g., CC8 and CC30). Thus, this lineage exhibits a
level of divergence more typical of an entire staphylococcal species. There is no evidence for high rates
of recombination between these isolates, either using the phi test (implemented in SplitsTree 4.0) (Huson
and Bryant 2006) (which was not significant) or the Recombination Detection Program suite of programs
(Martin and Rybicki 2000) (which did not identify any recombination events). 

Originally ST1223 was identified in isolates recovered 
from Cambodia (Ruimy et al. 2009); however, more recently 
Fig. 4. Variation within the CC75 lineage. Analysis of six CC75 
multilocus sequence types by neighbor joining, which resolves three 
groups. 

isolates from this ST have been reported at a high frequency in a sample of asymptomatically carried
isolates from Wayampi Amerindians living in a remote Amazonian village in French Guiana in South America
(Ruimy et al. 2010). Although ST1223 is distinct from most of the isolates from Australia, it is very
closely related to ST1823, which was represented by the Australian isolate HS2. 

Six isolates representing the three lineages (M34 [ST1824], MSHR1132 [ST1850], SCC1165 [ST1848], SCC1302
[ST1850], HS2 [ST1823], and HS22 [ST1849]) were subjected to phenotypic analysis and identification using a
VITEK2 device. All were identified as S. aureus with a confidence of either "very good" or "excellent."
This is essentially identical to the findings of Monecke et al. (2010). Staphylococcus aureus ATCC29213 is
the quality control strain recommended by the VITEK2 manufacturer. The CC75 isolates differed from this
only in their being positive for trehalose utilization and variable for urease production and O/129
resistance. Staphylococcus aureus ATCC29213 is negative, negative, and positive for these tests,
respectively. GenBank searches indicated that S. aureus commonly carries genes for urease production and
trehalose utilization, so these results do not correspond with any obvious genetic differences between CC75
and other S. aureus. It was concluded that commonly used biochemical tests will not reliably discriminate
CC75 isolates from other S. aureus. 

Evidence for Positive Selection 

In order to identify genes that may have experienced positive selection in CC75 and that may therefore
provide clues as to the adaptation of this lineage, we used the branch-site models implemented in PAML. A
total of 1,776 orthologous genes common to the genome sequences of MSHR1132, USA300_FPR3757, MRSA252
(S. aureus), and S. epidermidis RP62a were identified and used in this analysis. The MSHR1132 sequence was
defined as the foreground branch. Twenty-three of the 1,776 genes were identified as having experienced
positive selection in the branch leading to the CC75 lineage (table 1). These include five genes encoding
proteins associated with the cell envelope (Riley's category 4, including two putative membrane proteins
and the agr [accessory gene regulator] protein B) and a number of metabolic genes (Riley's category 3). The
analysis was repeated twice using MRSA252 or USA300 as the foreground branch, and this identified 33 and 29
genes under positive selection, respectively (supplementary tables S1 and S2, Supplementary Material
online). The relatively low number of genes identified to be under positive selection in CC75 is consistent
with the suggestion of lower propensity to cause disease than typical S. aureus. Interestingly, we note
that although 11 cell membrane genes (Riley's category 4) in MRSA252 were detected as positively selected,
only 2 cell membrane genes were identified in USA300. However, nine metabolic genes appear to be under
positive selection in USA300. This analysis therefore suggests quite different adaptive paths between CC75,
the hospital-associated MRSA252, and the community-acquired USA300, and also challenges the widely held
assumption that the evolution of metabolic genes is predominantly driven by purifying selection. 

CC75 Is Nonpigmented 

A defining feature of S. aureus is the production of the membrane-bound triterpenoid carotenoid pigment
staphyloxanthin that confers upon S. aureus its characteristic golden color (Marshall and Wilmoth 1981).
Staphyloxanthin protects against oxidative stress by scavenging reactive oxygen species, making S. aureus
more resistant to hydrogen peroxide, superoxide radical, hydroxyl radical, hypochlorite, and neutrophil
killing (Liu et al. 2005; Clauditz et al. 2006). It may also contribute to intracellular survival after
phagocytosis (Olivier et al. 2009). A S. aureus ΔcrtM mutant is more susceptible to oxidant killing, has
impaired neutrophil survival, and is less pathogenic in a mouse subcutaneous abscess model (Liu et al.
2005). Inhibition of staphyloxanthin synthesis in vivo resulted in increased susceptibility of S. aureus to
killing by human blood and to innate immune clearance in a mouse infection model (Liu 2008). 

Staphyloxanthin biosynthesis is encoded by CDSs SAR2642-SAR2647 in the MRSA252 chromosome, and orthologs
have been found in all other S. aureus genome sequences to date. Orthologous CDSs were not found in the
MSHR1132 genome. The ability of 126 CC75 isolates from our collection to produce staphyloxanthin was tested
by extended incubation on chocolate agar. None produced pigment, with the colonies being a brilliant white
color. In contrast, randomly selected non-CC75 S. aureus clinical isolates from our collection were all
clearly pigmented (fig. 5). It was therefore concluded that CC75 is nonpigmented. 

The orfX Region of MSHR1132 Encompasses a Type IVa SCCmec Element, Putative hsd Genes, and CRISPR/cas Genes 

MSHR1132 carries a 24,181-bp type IVa SCCmec element, typical of community-acquired MRSA (fig. 6). As is
seen with other staphylococci, it is inserted into orfX. The element is 
Table 1 

Identification of Genes under Positive Selection in MSHR1132 

MSHR1132 CDS      Riley's Category (a)   Predicted Function
MSHR1132_01070    4                      putative membrane protein
MSHR1132_02940    4                      lipase precursor
MSHR1132_03360    7                      putative GTP-binding protein
MSHR1132_04100    4                      putative membrane protein
MSHR1132_04330    7                      tetrapyrrole (corrin/porphyrin) methylase family protein
MSHR1132_06960    0                      conserved hypothetical protein
MSHR1132_10340    7                      cell division initiation protein, putative
MSHR1132_11510    3                      glutamine synthetase
MSHR1132_11740    3                      threonine synthase
MSHR1132_12220    1                      putative oligopeptide transporter ATPase
MSHR1132_12530    3                      dihydrolipoamide succinyltransferase E2 component of 2-oxoglutarate dehydrogenase complex
MSHR1132_13190    4                      cell surface elastin binding protein
MSHR1132_13700    7                      putative peptidase
MSHR1132_13730    7                      lipoate-protein ligase A protein
MSHR1132_14220    6                      heat-inducible transcription repressor
MSHR1132_15910    7                      FtsK/SpoIIIE family protein
MSHR1132_18500    1                      putative sodium transport protein
MSHR1132_18580    4                      accessory gene regulator protein B
MSHR1132_18760    3                      acetolactate synthase large subunit
MSHR1132_21220    0                      hypothetical protein
MSHR1132_22600    3                      putative glycerate kinase
MSHR1132_22710    7                      2-dehydropantoate 2-reductase
MSHR1132_23700    1                      putative ferrous iron transport protein B

Riley's category   Function                                         Number of genes
7                  Not classified (includes putative assignments)   7
4                  Cell envelope                                    5
3                  Metabolism of small molecules                    5
1                  Cell processes                                   3
0                  Unknown function, no known homologs              2
6                  Regulatory functions                             1

Note. A total of 1,776 orthologous genes common to MSHR1132, MRSA252, USA300_FPR3757, and S. epidermidis RP62a were analyzed using the branch-site models 
implemented in PAML, with the MSHR1132 sequence defined as the foreground branch. Twenty-three of the 1,776 genes were identified as having experienced positive selection 
in the branch leading to the CC75 lineage. 

(a) Genes were classified according to cellular function (78). 


most similar to the SCCmec in isolate JKD6159, an ST93 MRSA (Chua et al. 2010), with only three SNPs over
the entire length, and with a 55-bp duplication present in JKD6159. ST93 is a dominant strain of
Panton-Valentine leukocidin (PVL)+ community-acquired MRSA in Australia (Nimmo et al. 2006), having
recently emerged in northern or eastern Australia. It is undergoing radiation and currently coexists with
CC75 in the north of the Northern Territory (Tong et al. 2010). Recent horizontal transfer of this SCCmec
element between CC75 and ST93 would be the simplest explanation for their close similarity. The MSHR1132
SCCmec is also very similar to the SCCmec in the Japanese MRSA isolate JCSC4744 (ST391 and CC91) (Kishii
et al. 2008); there is only one SNP in the 18 kb that is available for this region in JCSC4744. There are
also only seven SNPs compared with the SCCmec element in the USA300 isolate TCH1516 (Highlander et al.
2007). 

Three hsd genes or pseudogenes are located immediately adjacent to SCCmec and distal to the chromosomal
origin of replication. Genes encoding restriction modification systems have previously been observed in
this location in both S. aureus and S. epidermidis (Mongkolrattanothai et al. 2004; Gill et al. 2005; Noto
et al. 2008). They have also been mobilized as part of the SCCmec type V element (Ito et al. 2004).
However, the hsd clusters at this location in different strains and species are only very distantly related
to each other, and much closer relatives of these genes can be found at other locations. This implies
multiple parallel events of hsd locus insertion into orfX. 

Immediately adjacent to the hsd gene cluster in MSHR1132 is the clustered regularly interspaced short
palindromic repeat (CRISPR)/CRISPR-associated (Cas) (CRISPR/cas) locus (fig. 6). CRISPR/cas has been found
in a variety of bacteria and Archaea, where it mediates defense against mobile genetic elements (Horvath
and Barrangou 2010). The only S. aureus lineage that has previously been noted to contain a CRISPR/cas
locus is the livestock-associated 


Genome Biol. Evol. 3:881-895. doi:10.1093/gbe/evr078 Advance Access publication August 2, 2011 



Staphylococcus aureus Lineage Lacking Staphyloxanthin 



GBE 




Fig. 5. CC75 isolates lack staphyloxanthin. The S. aureus clinical 
isolate SCC1007 (ST93) (Tong et al. 2009) (left) produces staphyloxanthin, 
whereas the CC75 isolate MSHR1132 (right) does not. 

ST398 clone (Golding et al. 2010). CRISPR/cas is present in at 
least some strains of S. epidermidis (Gill et al. 2005), where it 
has been demonstrated to limit plasmid conjugation 
(Marraffini and Sontheimer 2008), and S. lugdunensis (Tse 
et al. 2010) but has not been observed to date in 
S. saprophyticus, S. haemolyticus, S. carnosus, or Macrococcus 
caseolyticus. In general, CRISPR/cas elements have been 
identified in ~40% of bacterial genomes sequenced to date 
(van der Oost et al. 2009). 

CRISPR loci typically consist of repeat sequences inter- 
spersed with variable spacer sequences, which are generally 
segments of DNA captured from viral or plasmid sequences. 
These acquired and heritable DNA spacer sequences are uti- 
lized by the Cas-encoded proteins in a defense system 
against mobile genetic elements, although intriguingly 
the CRISPR/cas elements are themselves associated with 
plasmids and phage (Touchon and Rocha 2010). The CRISPR 
loci are most often located adjacent to the cas CDSs, which 
encode a heterogeneous family of proteins including nu- 
cleases, helicases, and polymerases (Horvath and Barrangou 
2010). Forty-five different protein families have been iden- 
tified within the cas CDSs within prokaryotic genomes (Haft 
et al. 2005). The number and type of cas CDSs can vary con- 
siderably and appear to be linked to the sequence and 
length of the repeat unit (Haft et al. 2005). 
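
The repeat/spacer organization described above can be illustrated with a minimal sketch. It assumes an array whose repeats are exact copies of a known repeat unit; the function name and the toy sequences are hypothetical and are not taken from the MSHR1132 genome. Real CRISPR arrays often contain degenerate repeats, so dedicated CRISPR-finding software would be needed in practice.

```python
from typing import List


def extract_spacers(array_seq: str, repeat: str) -> List[str]:
    """Return the sequences lying between consecutive exact copies of `repeat`.

    Simplifying assumption: repeats are identical copies of `repeat`.
    """
    parts = array_seq.upper().split(repeat.upper())
    # parts[0] and parts[-1] are the flanking sequences outside the array;
    # every piece in between sits between two repeats and is a spacer.
    return [p for p in parts[1:-1] if p]


# Toy array (made-up sequences): three repeats separating two spacers.
REPEAT = "GTTTCAGA"
toy_array = "AAAA" + REPEAT + "ACGTACGTAC" + REPEAT + "TTGACCATGG" + REPEAT + "CCCC"
spacers = extract_spacers(toy_array, REPEAT)
print(spacers)
```

Because each spacer is bounded by a repeat on both sides, an array with n repeats yields n - 1 spacers in this simplified model.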

The cas CDSs in MSHR1132 are homologous and syntenic 
with those found in the sequenced genomes of S. epidermidis 
RP62A and S. lugdunensis HKU09-01 and have similar 
levels of sequence divergence as other orthologous CDSs 
in these comparisons. Thus, the typical nucleotide 
divergence in the CRISPR/cas of CC75 suggests that there is 
no need to invoke recent horizontal acquisition to explain 
the presence of this element, the simplest explanation being 
that CRISPR/cas was present in a common ancestor of 
S. epidermidis, S. lugdunensis, and S. aureus and has been lost in 



conventional S. aureus since it diverged from CC75. However, 
the patchy distribution of CRISPR/cas throughout the 
Staphylococcus genus as a whole and within single species 
(notably the presence of the element in S. aureus strain 
ST398 and its absence in S. epidermidis ATCC12228) points 
to the mobility of this element. We also note that there is 
strong evidence for horizontal gene transfer of CRISPR/cas 
in other genera (Haft et al. 2005; Touchon and Rocha 
2010). 

Fig. 6. Structure of SCCmec and CRISPR/Cas region. Diagram of the structure of the region including SCCmec and CRISPR/Cas in MSHR1132, 
S. aureus 08BA02176 (ST398), S. epidermidis RP62a, and S. aureus MRSA252. 

In summary, there is conflicting evidence as to whether 
the presence of CRISPR/Cas is an ancestral or acquired 
characteristic in MSHR1132. Although sequence similarities 
suggest that it is ancestral, its patchy distribution and close 
association with SCCmec in the staphylococci suggest that 
it is either mobile or easily able to be mobilized by other 
elements. 

The CRISPR spacer sequences are of interest because they 
provide a record of past challenges from mobile genetic 
elements and thus potentially indicate whether the presence 
of CRISPR in MSHR1132 has the potential to impede the 
acquisition of genes with clinical significance. The MSHR1132 
CRISPR/cas locus possesses two CRISPR repeat/spacer 
arrays; the left contains six spacers (L1-L6) and the right four 
spacers (R1-R4) (supplementary table S3, Supplementary 
Material online). Spacers L5, L6, and R2 show similarity to 
small hypothetical genes observed in multiple members 
of the Siphoviridae that are known to infect S. aureus 
and in similar prophages in S. aureus genomes. Spacer L6 is 
of particular interest because it is identical to sequence in 
phages that carry the lukPV genes that encode the PVL toxin. 
We therefore screened 126 isolates from our collection of CC75 
isolates for the presence of the lukPV genes, and as expected, all 
were negative. Spacers L4 and R1 are similar to S. aureus 
phages in the Myoviridae family, whereas spacer R4 is 
similar to S. aureus plasmid sequences. The "WBG" plasmids 
listed (supplementary table S3, Supplementary Material 
online) are resident in S. aureus isolated from the remote north 
of Western Australia, which is consistent with the northern 
Australian origin of MSHR1132. 
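
As an illustration of how a spacer can be matched against a candidate phage or plasmid sequence, the following minimal sketch searches both strands for exact matches. This is not the search procedure used in this study: the function names and sequences are hypothetical, and exact matching is a simplifying assumption, since similarity searches normally tolerate mismatches.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_spacer(spacer, target):
    """List (strand, 0-based start) for every exact match of `spacer` in `target`.

    '+' means the spacer itself matches; '-' means its reverse complement does.
    """
    target = target.upper()
    hits = []
    for strand, query in (("+", spacer.upper()), ("-", reverse_complement(spacer))):
        start = target.find(query)
        while start != -1:
            hits.append((strand, start))
            start = target.find(query, start + 1)
    return hits


# Made-up sequences: one target carries the spacer, the other its reverse complement.
toy_spacer = "GATTACAGG"
print(find_spacer(toy_spacer, "TTTGATTACAGGAAA"))
print(find_spacer(toy_spacer, "AACCTGTAATCTT"))
```

Searching both strands matters because a spacer may match either orientation of the mobile element from which it was captured.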

Other Mobile Elements and Genomic Island Regions 

The genomic islands, νSaα and νSaβ, are present at the 
same locations as in other S. aureus strains (fig. 1). These 
regions were examined to determine whether they had 
been acquired independently in MSHR1132 or inherited 
from a common ancestor. The νSaα genomic island in 
MSHR1132 is approximately 22 kb in size and includes CDSs 
for 7 exotoxins, type I restriction-modification system M and 
S subunits, and an array of 11 lipoproteins (fig. 7). As there 
are a number of differences relative to νSaα in other 
S. aureus strains, including the presence of a novel hsdS gene 
sequence, we argue that νSaα should be classified into a new 
type V. The exotoxin and lipoprotein sequences have between 
84% and 97% identity with MRSA252, and no significantly 
higher levels of similarity were found with other S. aureus 
strains. As this level of divergence is typical for orthologous 
genes elsewhere in the genome, these data indicate a 
long-term stable association with this island rather than a recent 
independent acquisition. 

The νSaβ genomic island in MSHR1132 is approximately 
25 kb in size and includes CDSs for six enterotoxins, type I 
restriction-modification system S and M subunits, and 
numerous hypothetical proteins (fig. 7). The serine protease 
gene cluster present in other S. aureus isolates is absent. 
As with νSaα, based on these differences and the 
presence of a novel hsdS gene sequence, we propose that 
νSaβ be classified into a new type IV. The enterotoxin array is 
syntenic to MRSA252, and the genes have sequence 
identities of 92-98%. Again, this level of divergence is similar to 
that of the genome as a whole; hence, there is no compelling 
reason to invoke recent acquisition by horizontal transfer. 

The 42,445-bp φSa3(MSHR1132) prophage is inserted at 
the same site as φSa3 insertions in other S. aureus strains, 
within the phospholipase C precursor gene (SAMSHR1132_17840). 
φSa3(MSHR1132) is largely syntenic with other 
φSa3 elements but lacks the gene encoding enterotoxin 
type A (SAR2043). The nucleotide sequence identity 
between φSa3(MSHR1132) CDSs and homologous CDSs in 
other S. aureus φSa3 prophages is highly variable from gene 
to gene, consistent with a highly mosaic structure. For 
example, the most similar public nucleotide database entry to 
the φSa3(MSHR1132) portal protein gene (SAMSHR1132_18060) 
is from S. aureus isolates JH1 and JH9, and there is 
only 75% identity. In contrast, the 756-bp autolysin-encoding 
gene (SAMSHR1132_17880) has just one SNP relative to 
homologues in a variety of S. aureus isolates. In contrast to the 
genomic islands discussed above, these data suggest that 
there have been independent insertions of prophage or 
homologous recombination of prophage genes since 
CC75 diverged from other S. aureus. 
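
The per-gene comparisons above rest on simple arithmetic: the number of SNPs between two aligned sequences and the corresponding percent identity. The sketch below is illustrative only; the function names are hypothetical, the sequences are assumed to be pre-aligned and of equal length, and gap columns are ignored when counting SNPs. It applies the calculation to the one figure stated in the text, a single SNP in a 756-bp gene.

```python
def count_snps(aligned_a, aligned_b):
    """Count mismatched columns between two aligned sequences, skipping gaps ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    return sum(
        1
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x != y and x != "-" and y != "-"
    )


def percent_identity(n_snps, length):
    """Percent identity for `n_snps` differences over `length` compared positions."""
    return 100.0 * (length - n_snps) / length


# One SNP in the 756-bp autolysin-encoding gene, as stated in the text.
print(round(percent_identity(1, 756), 2))  # 99.87
```

On this scale, a gene at 75% identity and a gene at about 99.9% identity within the same prophage differ enough to be consistent with a mosaic structure.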

There is a 14,419-bp ICE integrated into, and disrupting, 
a putative membrane CDS (SAMSHR1132_15220). This 
entire element is ~99% identical to ICE6013 (Smyth 
and Robinson 2009) at the nucleotide level, with the 
exception that ICE6013(MSHR1132) contains two additional 
CDSs (SAMSHR1132_15320 and SAMSHR1132_15330), 
which encode putative membrane proteins of unknown 
function. ICE6013 insertions are found in several locations 
in S. aureus, and ICE6013(MSHR1132) is inserted at the same 
location as ICE6013 in several sequenced strains (fig. 1). An 
ICE6013 element essentially identical to ICE6013(MSHR1132), 
including the additional CDSs, has been identified in plasmid 
pSK53 from S. aureus SK1700 that was isolated in 
Melbourne, Australia, in 1975 (GenBank accession number 
GQ915270). To date, this structure has not been found 
elsewhere. The high level of identity between ICE6013(MSHR1132) 
and ICE6013 in other S. aureus strains and the specific 
similarity with a plasmid-borne form mean that it is likely that 
insertion of ICE6013(MSHR1132) took place after the divergence 
of CC75 and other S. aureus. 

Fig. 7. Structure of the MSHR1132 νSaα and νSaβ genomic islands. The MSHR1132 genome contains novel νSaα and νSaβ genomic islands, 
which were assigned types V and IV, respectively. 

There is a putative transposon between bases 2,147,694 
and 2,157,171. It is 98.3% identical and entirely syntenic 
with a putative transposon that has previously been observed 
only in the genome of S. aureus S0385. S0385 belongs to 
the pig-associated ST398 but is derived from a case of 
human endocarditis (Schijffelen et al. 2010). Similar to 
ICE6013, the high level of similarity with the structure in 
S. aureus S0385 suggests that its insertion postdates the 
divergence of CC75 and other S. aureus. 

MSHR1132 contains a single plasmid (pMSHR1132) of 
24,853 bp. The backbone of the plasmid is similar to other 
plasmids found in S. aureus, including pTW20_2, which was 
identified in the TW20 (ST239) genome (Holden et al. 2010). 
The plasmid encodes a number of proteins involved in 
antibiotic and antiseptic resistance. An enoyl-acyl-carrier-protein 
reductase (NADH) 2 (FabI2) is found in an indel region of 
the plasmid. The protein is associated with hexachlorophene 
and triclosan resistance. This plasmid copy is highly similar to 
the chromosomal copy, but notably, the plasmid copy has 
substitutions associated with resistance (e.g., F204C; Fan 
et al. 2002). The plasmid also encodes the antiseptic 
resistance protein QacA in a divergent operon with the 
transcriptional regulator QacR and a beta-lactamase (BlaZ) on 
a remnant of a Tn552 transposon. 

Discussion 

The analysis of the MSHR1132 genome sequence confirmed 
the phylogenetic divergence between CC75 and other 
S. aureus indicated by multilocus sequence analysis and 
array-based experiments (Ng et al. 2009; Ruimy et al. 2009; 
Monecke et al. 2010). Averaging the data from 1,498 
orthologous CDSs revealed that MSHR1132 is approximately seven 
times more divergent from other S. aureus than major 
phylogenetic groups within non-CC75 S. aureus are diverged from 
each other and that the divergence between CC75 and other 
S. aureus is slightly under half the divergence between 
S. aureus and S. epidermidis. Non-CC75 S. aureus has 
been very successful and has undergone a global radiation, 
and it is clear that CC75 diverged from the progenitor of this 
radiation long before the radiation began. 

The virulence of CC75 is far from understood. Essentially, 
the only systematic study was reported by Tong et al. (2010) 
and was based upon clinical isolates from RDH in the 
Australian Northern Territory. The spectrum of disease caused by 
CC75 was similar to other community-associated S. aureus 
lineages. However, CC75 was less likely to cause abscesses 
and sepsis. Also, this report outlined evidence that CC75 is 
underrepresented in clinical isolates from RDH, given its 
abundance in superficial skin lesions in some communities 
in the RDH catchment. Our working model is that CC75 
has a greater association with minor skin infections and less 
propensity to cause serious and systemic infections than 
other S. aureus. We speculate that this reflects an ancestral, 
less invasive, and less versatile virulence phenotype as 
compared with typical S. aureus. The retention of this phenotype 
may be a cause of the small number of genes in MSHR1132 
under positive selection for variation. The golden pigment 
staphyloxanthin is known to facilitate S. aureus invasive 
infections by providing oxidative defense (Liu et al. 2005; 
Clauditz et al. 2006; Liu et al. 2008; Olivier et al. 2009). 
The connection between the CC75 virulence phenotype 
and its lack of staphyloxanthin is an interesting topic for 
future research. 

The MSHR1132 genome contains a locus specifying 
a CRISPR/cas system. It is intriguing that CRISPR/cas and 
the hsd genes are so close to each other and may have both 
inserted into orfX (fig. 6). This implies that orfX has attracted 
not only the SCCmec mobile elements but also DNA 
encoding two separate systems both devoted to defense 
against mobile elements but that are themselves mobile. 
Clusters of this nature have been recently noted and termed 
"defense islands," although the molecular mechanisms and 
evolutionary pressures underlying this are not currently 
understood (Makarova et al. 2009). 

The CRISPR spacers provide a record of past encounters 
with bacteriophage and plasmids closely related to elements 
known to infect or reside in S. aureus and in some cases are 
identical to highly conserved and widely distributed 
bacteriophage genes, including those that encode the PVL toxin. 
The accessory genome of MSHR1132 is at the small end of 
the range found in S. aureus and contains no SaPIs. It is 
possible that this is a function of the presence of CRISPR/cas and 
that CC75 has a less plastic genome than other S. aureus. 
This is consistent with the notion that CC75 has an ancestral 
phenotype that is less impacted by the accessory genome. 
However, although it is tempting to regard CC75 as an 
ancestral relic and the absence of pigment and presence of 
CRISPR as primitive characteristics, a recent loss of 
pigmentation genes and acquisition of CRISPR cannot be ruled out 
at this time. Also, the presence of SCCmec, an ICE element, 
a prophage, a transposon, and a plasmid in the MSHR1132 
genome demonstrates that this lineage is not completely 
resistant to genetic invaders. 

The currently known global distribution of the CC75 
lineage poses something of a mystery. Although the presence 
of this lineage in Southeast Asia can be broadly explained by 
geographical proximity to northern Australia, the discovery 
of ST1223 at a high frequency in carriage in a remote village 
in French Guiana is far more difficult to explain. Although 
we note paleontological evidence that the first South 
American settlers arrived from Australia (Neves and Hubbe 2005), 
we remain cautious in linking these hypotheses with the 
current distribution of CC75 in the absence of more extensive 
sampling. Furthermore, it is striking that the two MLST STs 
ST1223 and ST1823 are almost identical, differing at only 
two sites (0.06%), despite the former being recovered from 
French Guiana and the latter being an Australian isolate. 
This would argue against the view that S. aureus 
populations at these sites have remained isolated from each other 
and from the rest of the world since the earliest colonization 
of South America between 10,000 and 50,000 years ago. It 
is also noteworthy in this context that many of the globally 
widespread clonal complexes of typical S. aureus are also 
found both in French Guiana and in northern Australia, 
which again argues against the isolation of these 
populations. Thus, the current data remain insufficient to draw 
strong conclusions relating the distribution of CC75 to 
ancient patterns of human migration but rather suggest that 
the global-scale dissemination patterns of CC75 may 
resemble those of typical S. aureus. It is noteworthy that although very 
closely related CC75 STs may be found on opposite sides of 
the world, the MSHR1132 SCCmec element is virtually 
identical to that found in another community-associated 
S. aureus lineage that has a similar geographical distribution 
to Australian CC75. Also, the MSHR1132 ICE6013 allotype 
has been found only in Australia. 

Non-CC75 S. aureus is abundant and globally distributed. 
On the basis of universally high core genome ortholog 
similarities, it arose from a point source that existed a long time 
after the divergence of the ancestor of this point source 
from any other known S. aureus species. The identification 
of CC75 as a group allied to but outside the global radiation 
of non-CC75 S. aureus has the potential to greatly assist the 
determination of the genetic basis for the global dominance 
of typical S. aureus and the associated burden of human 
disease. A complete answer will require more CC75 and 
non-CC75 S. aureus genome sequences. However, this study 
revealed not only that CC75 isolates lack staphyloxanthin but 
also that MSHR1132 possesses the genomic islands νSaα and 
νSaβ. Sequence similarities indicate that the presence of 
these elements is the ancestral state for CC75 and 
non-CC75 S. aureus. Therefore, the simplest model is that the 
acquisition of these elements long predated and did not 
initiate the radiation of non-CC75 S. aureus. However, we also 
note that these regions appear to undergo homologous 
recombination at much higher rates than core genes within 
the typical S. aureus population, as phylogenies based on 
the allelic variation present in these islands are highly 
discordant with MLST data (Tsuru et al. 2006; Baba et al. 2008; 
Tsuru and Kobayashi 2008). These two lines of evidence 
can be reconciled by arguing that the recent radiation of 
the typical S. aureus population has been accompanied by an 
increase in recombination within these regions. Indeed, it 
is possible that genes encoded on the islands themselves 
determine the degree to which they recombine. The islands 
contain genes for a type I restriction-modification system 
that have contributed to the subsequent evolution of 
distinct S. aureus lineages and controlled the degree of 
horizontal transfer between these lineages (Waldron and 
Lindsay 2006). Consistent with this, MSHR1132 has a unique 
hsdS gene sequence. 

MSHR1132, and the CC75 lineage in general, presents 
a very interesting and challenging case study in bacterial 
systematics, particularly given the recent interest in identifying 
operational and conceptual species definitions (Achtman 
and Wagner 2008). The "lumpers" will point to the fact that 
CC75 is phenotypically essentially identical to typical 
S. aureus, with the exception of a lack of toxin production and 
possibly lower virulence, and this phenotypic similarity is 
consistent with the relatively conserved noncore genome 
and the absence of novel mobile elements. Furthermore, 
16S rRNA data would, similar to classic biochemical criteria, 
unquestionably place CC75 within the main S. aureus 
population. On the other hand, the "splitters" will argue a strong 
case for the promotion of CC75 to a separate species or 
subspecies. In contrast to 16S rRNA, the divergence of the core 
genome is striking and places CC75 almost midway 
between typical S. aureus and S. epidermidis. CC75 is 
therefore a clearly distinct lineage from that which has undergone 
recent global radiation to become the typical S. aureus. 
Following the proposal by Gevers et al. (2005) that core 
genome divergence can be a powerful way to delineate 
species, this observation alone should be sufficient to promote 
this lineage to at least a new subspecies. The recent emphasis 
on the use of average nucleotide identity is also consistent 
with this view (Richter and Rossello-Mora 2009). 
Furthermore, the MLST data for 13 CC75 isolates revealed a 
maximum level of divergence of almost 1% even within this very 
limited sample. This is a respectable degree of diversity for 
a separate species and confirms that this lineage is far 
broader than a single "clonal complex." Gene flow has also 
frequently been considered an important criterion for 
species delineation (Dykhuizen and Green 1991; Fraser et al. 
2007). The presence of a CRISPR element, the conserved 
noncore, and the failure to find evidence for recombination 
both within the CC75 core genome and between CC75 and 
typical S. aureus all point to the genetic isolation of this 
lineage. Such taxonomic dilemmas will likely become 
increasingly common as genetic sampling of the microbiosphere 
intensifies. For example, a recent study of Escherichia coli 
diversity has revealed the CV lineage that is phenotypically 
E. coli but as phylogenetically divergent from other E. coli as 
is Escherichia fergusonii (Walk et al. 2009). 

Finally, the apparent restricted distribution of this lineage 
and possible lower virulence (due to lack of pigment 
production and other virulence factors) point to an ecological 
difference. Although further work is needed to confirm this 
final point, in the end, we side with the splitters and argue 
that the high degree of core genome divergence outweighs 
the apparent phenotypic similarity and 16S rRNA uniformity. 
Serious consideration should therefore be given to formally 
reclassifying the CC75 lineage as a new species, named 
S. argenteus (silver staph), reflective of the absence of 
staphyloxanthin. 

Supplementary Material 

Supplementary tables S1-S3 are available at Genome Biology 
and Evolution online (http://www.gbe.oxfordjournals.org/). 